我们介绍了第一个基于学习的可重建性预测指标,以改善使用无人机的大规模3D城市场景获取的视图和路径计划。与以前的启发式方法相反,我们的方法学习了一个模型,该模型明确预测了从一组观点重建3D城市场景的能力。为了使这种模型可训练并同时适用于无人机路径计划,我们在培训期间模拟了基于代理的3D场景重建以设置预测。具体而言,我们设计的神经网络经过训练,可以预测场景的重构性,这是代理几何学的函数,一组观点,以及在飞行中获得的一系列场景图像。为了重建一个新的城市场景,我们首先构建了3D场景代理,然后依靠我们网络的预测重建质量和不确定性度量,基于代理几何形状,以指导无人机路径计划。我们证明,与先前的启发式措施相比,我们的数据驱动的可重建性预测与真实的重建质量更加紧密相关。此外,我们学到的预测变量可以轻松地集成到现有的路径计划中,以产生改进。最后,我们根据学习的可重建性设计了一个新的迭代视图计划框架,并在重建合成场景和真实场景时展示新计划者的卓越性能。
translated by 谷歌翻译
由于低成本的惯性传感器误差积累,行人死的估算是一项具有挑战性的任务。最近的研究表明,深度学习方法可以在处理此问题时获得令人印象深刻的性能。在这封信中,我们使用基于深度学习的速度估计方法提出了惯性的进程。基于RES2NET模块和两个卷积块注意模块的深神经网络被利用,以恢复智能手机的水平速度矢量与原始惯性数据之间的潜在连接。我们的网络仅使用百分之五十的公共惯性探子仪数据集(RONIN)数据进行培训。然后,在Ronin测试数据集和另一个公共惯性探针数据集(OXIOD)上进行了验证。与传统的阶梯长度和基于标题的基于系统的算法相比,我们的方法将绝对翻译误差(ATE)降低了76%-86%。此外,与最先进的深度学习方法(Ronin)相比,我们的方法将其ATE提高了6%-31.4%。
translated by 谷歌翻译
视觉地点识别(VPR)是一个具有挑战性的任务,具有巨大的计算成本与高识别性能之间的不平衡。由于轻质卷积神经网络(CNNS)和局部聚合描述符(VLAD)层向量的火车能力的实用特征提取能力,我们提出了一种由前部组成的轻量级弱监管的端到端神经网络-anded的感知模型称为ghostcnn和学习的VLAD层作为后端。 Ghostcnn基于幽灵模块,这些模块是基于重量的CNN架构。它们可以使用线性操作而不是传统的卷积过程生成冗余特征映射,从而在计算资源和识别准确性之间进行良好的权衡。为了进一步增强我们提出的轻量级模型,我们将扩张的卷曲添加到Ghost模块中,以获取包含更多空间语义信息的功能,提高准确性。最后,在常用的公共基准和我们的私人数据集上进行的丰富实验验证了所提出的神经网络,分别将VGG16-NetVlad的拖鞋和参数减少了99.04%和80.16%。此外,两种模型都达到了类似的准确性。
translated by 谷歌翻译
我们提出了一种基于注意力的新型机制,可以学习用于点云处理任务的增强点特征,例如分类和分割。与先前的作品不同,该作品经过培训以优化预选的一组注意点的权重,我们的方法学会了找到最佳的注意点,以最大程度地提高特定任务的性能,例如点云分类。重要的是,我们主张使用单个注意点来促进语义理解在点特征学习中。具体而言,我们制定了一种新的简单卷积,该卷积结合了输入点及其相应学习的注意点或膝盖的卷积特征。我们的注意机制可以轻松地纳入最新的点云分类和分割网络中。对诸如ModelNet40,ShapenetPart和S3DIS之类的常见基准测试的广泛实验都表明,我们的支持LAP的网络始终优于各自的原始网络,以及其他竞争性替代方案,这些替代方案在我们的膝盖下采用了多个注意力框架。
translated by 谷歌翻译
最近的研究表明,许多发达的视觉问题的答案(VQA)模型受到先前问题的严重影响,这是指基于文本问题和答案之间的共同发生模式来提出预测而不是推理视觉内容。为了解决它,大多数现有方法都侧重于增强视觉特征学习,以减少对VQA模型决策的这种肤浅的快捷方式影响。然而,有限的努力已经致力于为其固有原因提供明确的解释。因此,缺乏以有目的的方式向前迈出前进的良好指导,导致模型构建困惑在克服这种非琐碎问题时。在本文中,我们建议从类 - 不平衡视图中解释VQA中的语言。具体地,我们设计了一种新颖的解释方案,从而在晚期训练阶段明显展出了误差频繁和稀疏答案的丢失。它明确揭示了为什么VQA模型倾向于产生频繁但是明显的错误答案,给出的给定问题,其正确答案在训练集中稀疏。基于此观察,我们进一步开发了一种新的损失重新缩放方法,以基于计算最终损失的训练数据统计来为每个答案分配不同权重。我们将我们的方法应用于三个基线,两个VQA-CP基准数据集的实验结果明显证明了其有效性。此外,我们还可以证明在其他计算机视觉任务上的类别不平衡解释方案的有效性,例如面部识别和图像分类。
translated by 谷歌翻译
In recent years, deep neural networks have yielded immense success on speech recognition, computer vision and natural language processing. However, the exploration of deep neural networks on recommender systems has received relatively less scrutiny. In this work, we strive to develop techniques based on neural networks to tackle the key problem in recommendation -collaborative filtering -on the basis of implicit feedback.Although some recent work has employed deep learning for recommendation, they primarily used it to model auxiliary information, such as textual descriptions of items and acoustic features of musics. When it comes to model the key factor in collaborative filtering -the interaction between user and item features, they still resorted to matrix factorization and applied an inner product on the latent features of users and items.By replacing the inner product with a neural architecture that can learn an arbitrary function from data, we present a general framework named NCF, short for Neural networkbased Collaborative Filtering. NCF is generic and can express and generalize matrix factorization under its framework. To supercharge NCF modelling with non-linearities, we propose to leverage a multi-layer perceptron to learn the user-item interaction function. Extensive experiments on two real-world datasets show significant improvements of our proposed NCF framework over the state-of-the-art methods. Empirical evidence shows that using deeper layers of neural networks offers better recommendation performance.
translated by 谷歌翻译
Visual attention has been successfully applied in structural prediction tasks such as visual captioning and question answering. Existing visual attention models are generally spatial, i.e., the attention is modeled as spatial probabilities that re-weight the last conv-layer feature map of a CNN encoding an input image. However, we argue that such spatial attention does not necessarily conform to the attention mechanism -a dynamic feature extractor that combines contextual fixations over time, as CNN features are naturally spatial, channel-wise and multi-layer. In this paper, we introduce a novel convolutional neural network dubbed SCA-CNN that incorporates Spatial and Channelwise Attentions in a CNN. In the task of image captioning, SCA-CNN dynamically modulates the sentence generation context in multi-layer feature maps, encoding where (i.e., attentive spatial locations at multiple layers) and what (i.e., attentive channels) the visual attention is. We evaluate the proposed SCA-CNN architecture on three benchmark image captioning datasets: Flickr8K, Flickr30K, and MSCOCO. It is consistently observed that SCA-CNN significantly outperforms state-of-the-art visual attention-based image captioning methods.1 Each convolutional layer is optionally followed by a pooling, downsampling, normalization, or a fully connected layer.
translated by 谷歌翻译
In recent years, the exponential proliferation of smart devices with their intelligent applications poses severe challenges on conventional cellular networks. Such challenges can be potentially overcome by integrating communication, computing, caching, and control (i4C) technologies. In this survey, we first give a snapshot of different aspects of the i4C, comprising background, motivation, leading technological enablers, potential applications, and use cases. Next, we describe different models of communication, computing, caching, and control (4C) to lay the foundation of the integration approach. We review current state-of-the-art research efforts related to the i4C, focusing on recent trends of both conventional and artificial intelligence (AI)-based integration approaches. We also highlight the need for intelligence in resources integration. Then, we discuss integration of sensing and communication (ISAC) and classify the integration approaches into various classes. Finally, we propose open challenges and present future research directions for beyond 5G networks, such as 6G.
translated by 谷歌翻译
Currently, most deep learning methods cannot solve the problem of scarcity of industrial product defect samples and significant differences in characteristics. This paper proposes an unsupervised defect detection algorithm based on a reconstruction network, which is realized using only a large number of easily obtained defect-free sample data. The network includes two parts: image reconstruction and surface defect area detection. The reconstruction network is designed through a fully convolutional autoencoder with a lightweight structure. Only a small number of normal samples are used for training so that the reconstruction network can be A defect-free reconstructed image is generated. A function combining structural loss and $\mathit{L}1$ loss is proposed as the loss function of the reconstruction network to solve the problem of poor detection of irregular texture surface defects. Further, the residual of the reconstructed image and the image to be tested is used as the possible region of the defect, and conventional image operations can realize the location of the fault. The unsupervised defect detection algorithm of the proposed reconstruction network is used on multiple defect image sample sets. Compared with other similar algorithms, the results show that the unsupervised defect detection algorithm of the reconstructed network has strong robustness and accuracy.
translated by 谷歌翻译
We study the problem of semantic segmentation calibration. For image classification, lots of existing solutions are proposed to alleviate model miscalibration of confidence. However, to date, confidence calibration research on semantic segmentation is still limited. We provide a systematic study on the calibration of semantic segmentation models and propose a simple yet effective approach. First, we find that model capacity, crop size, multi-scale testing, and prediction correctness have impact on calibration. Among them, prediction correctness, especially misprediction, is more important to miscalibration due to over-confidence. Next, we propose a simple, unifying, and effective approach, namely selective scaling, by separating correct/incorrect prediction for scaling and more focusing on misprediction logit smoothing. Then, we study popular existing calibration methods and compare them with selective scaling on semantic segmentation calibration. We conduct extensive experiments with a variety of benchmarks on both in-domain and domain-shift calibration, and show that selective scaling consistently outperforms other methods.
translated by 谷歌翻译